Valency and Word Order in Czech ― A Corpus Probe

Authors

  • Kateřina Rysová
  • Jiří Mírovský
Abstract

We present part of a broader study of word order that aims to identify the factors influencing word order in Czech (an inflectional language) and their strength. The main aim of the paper is to test the hypothesis that obligatory adverbials (obligatory in terms of valency) follow non-obligatory (i.e. optional) ones in the surface word order. We tested the hypothesis in two ways: by creating a list of features for a decision tree algorithm, and by searching the data of the Prague Dependency Treebank with the search tool PML Tree Query. Apart from valency, our experiment also evaluates the importance of several other features, such as argument length and deep syntactic function. Neither method confirmed the hypothesis. According to the results, however, several other features influence the word order of contextually non-bound free modifiers of a verb in Czech, namely the position of the sentence in the text, the form and length of the verb modifiers (the whole subtrees), and the semantic dependency relation (functor) of the modifiers.
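The feature-evaluation idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual feature list or the decision tree implementation, so everything below is an assumption: the feature names (`obligatory_longer`, `first_sentence`), the target `obligatory_last`, and the toy rows are invented for illustration and are not drawn from the Prague Dependency Treebank. The sketch ranks features by information gain, the criterion an ID3-style decision tree learner uses to choose its splits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy reduction in `target` obtained by splitting `rows` on `feature`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_features(rows, features, target):
    """Return (feature, gain) pairs sorted from most to least informative."""
    gains = {f: information_gain(rows, f, target) for f in features}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Invented toy data (hypothetical, NOT corpus results). Each row stands for one
# pair of verb modifiers, one obligatory and one optional:
#   obligatory_longer: the obligatory modifier's subtree is the longer one
#   first_sentence:    the sentence opens its text
#   obligatory_last:   the obligatory modifier comes second on the surface
ROWS = [
    {"obligatory_longer": True,  "first_sentence": True,  "obligatory_last": True},
    {"obligatory_longer": True,  "first_sentence": False, "obligatory_last": True},
    {"obligatory_longer": True,  "first_sentence": True,  "obligatory_last": True},
    {"obligatory_longer": True,  "first_sentence": False, "obligatory_last": True},
    {"obligatory_longer": False, "first_sentence": True,  "obligatory_last": False},
    {"obligatory_longer": False, "first_sentence": False, "obligatory_last": False},
    {"obligatory_longer": False, "first_sentence": True,  "obligatory_last": False},
    {"obligatory_longer": False, "first_sentence": False, "obligatory_last": False},
]

if __name__ == "__main__":
    ranking = rank_features(ROWS, ["obligatory_longer", "first_sentence"], "obligatory_last")
    for name, gain in ranking:
        print(f"{name}: {gain:.3f} bits")
```

In a real experiment the rows would be extracted from treebank annotation and the ranking would only be as trustworthy as that extraction; the toy data here is constructed so that one feature separates the two orders perfectly and the other not at all.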


Similar resources

Valency Lexicon of Czech Verbs VALLEX: Recent Experiments with Frame Disambiguation

VALLEX is a linguistically annotated lexicon aiming at a description of syntactic information which is supposed to be useful for NLP. The lexicon contains roughly 2500 manually annotated Czech verbs with over 6000 valency frames (summer 2005). In this paper we introduce VALLEX and describe an experiment where VALLEX frames were assigned to 10,000 corpus instances of 100 Czech verbs – the pairwi...


An Analysis of Annotation of Verb-Noun Idiomatic Combinations in a Parallel Dependency Corpus

Valency in the PDTs. For Czech, PDT-Vallex: http://ufal.mff.cuni.cz/lindat/PDT-Vallex.html. For English, EngVallex: http://ufal.mff.cuni.cz/lindat/EngVallex.html. PDTs (ID -> val. frame), lexicons; "to give": ACT(.1) PAT(.4) ADDR(.3). Valency and MWEs. Valency: the ability of words to combine with other lexical units (FGD valency theory). MWEs: “idiosyncratic interpretations that cross word boundaries...


VALEVAL: Testing Vallex Consistency and Experimenting with Word-Frame Disambiguation

VALLEX is a valency lexicon of Czech verbs. We briefly introduce VALLEX and then describe and evaluate the VALEVAL experiment: annotation of 10256 corpus instances of 109 Czech verbs with valency frames. The inter-annotator agreement of three parallel annotations ranges from 61% to 74% and κ from 0.52 to 0.62. More than 8000 sentences are now available as the “golden VALEVAL” for word-sense dis...


Towards a Corpus-based Valency Lexicon of Czech Nouns

Corpus-based Valency Lexicon of Czech Nouns is a starting project picking up the threads of our previous work on nominal valency. It builds upon solid theoretical foundations of the theory of valency developed within the Functional Generative Description. In this paper, we describe the ways of treating valency of nouns in a modern corpus-based lexicon, available as machine readable data in a fo...


Legal Terms and Word Sketches: A Case Study

In this paper we describe an approach to the semiautomatic identification of legal terms in Czech texts. Our general goal is to offer supplementary tools for building a dictionary of Czech legal terms. First, we used the VaDis partial parser to recognize complex nominal constructions in a legal text – the current version of the Penal Code of the Czech Republic. Headwords of the recogniz...




Publication date: 2014